ABSTRACT

This paper describes an algorithm for the conversion of a
grammar in the form of a set of BNF productions into a deterministic
parsing algorithm as described by a set of modified Floyd productions.
It describes the implementation of a recognizer based on Floyd
productions, including optimization of the recognizer and syntactic error
recovery. A complete example is given in an appendix and illustrations
from it are used in the text.


Computing Reviews Category Numbers: 4.1 and 4.2


1.   Introduction

Floyd Production Language (FPL), developed by Floyd [1] and
later modified by Evans [2] and Feldman [3], has often been used to specify
the syntax of a language for writing compilers. An FPL syntax
specification of a language ℒ is a direct specification of a (usually) very
fast deterministic parsing algorithm for ℒ.

The  specification  is  written  as  a  set  of  labeled  groups  of  FPL 
statements  (often  called  "productions",  although  "reductions"  would  be  more 
appropriate).   Many  minor  variations  on  the  form  of  the  statements  and  on 
the  mnemonic  meanings  of  the  labels  have  been  used.   The  form  used  in  this 
paper  is  influenced  by  the  fact  that  the  statements,  and  the  labels  to 
them,  are  generated  automatically.   The  notation  used  will  be  described 
as it is introduced. An FPL statement will have the form:

    L1:   α | β   →N   ←N   | *   L2

where

(a)  L1:   (optional) labels the group L1 of 1 or more FPL statements;

(b)  α     is a string of n symbols which is to be compared with the
           top n symbols of the recognition stack;

(c)  β     (optional) is a string of m symbols to be compared with
           the next m symbols on the input string;

(d)  →N    (optional) means reduce the string α in the recognition
           stack to the single nonterminal symbol N;

(e)  ←N    (optional) means place the marker symbol N on the marker
           stack to indicate that the nonterminal symbol N is being
           sought;

(f)  *     (optional) is a call on the lexical analyzer (scanner)
           to place the next symbol from the input string into the
           recognition stack;

(g)  L2    is the label of the next group of statements to be
           executed;

(h)        if the comparisons in (b) and (c) are unsuccessful, the
           next statement in sequence (in the same group) is
           executed;

(i)        if there are no statements left in the group, a syntactic
           error has occurred.

An  example  of  a  36  statement  FPL  syntax  specification  of  a 
simple  language  is  given  in  the  Appendix. 
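
The statement form described above can be sketched as a small data
structure plus a routine that executes one group. This is only an
illustrative reading of items (a) to (i): the names Statement and step are
hypothetical, and the sketch assumes the stack string α always has at
least one symbol.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Statement:
    stack: tuple                      # alpha: compared with top of recognition stack
    lookahead: tuple = ()             # beta: compared with next input symbols
    reduce_to: Optional[str] = None   # ->N: reduce alpha to nonterminal N
    push_bar: Optional[str] = None    # <-N: marker pushed on the bar stack
    scan: bool = False                # *: scan next input symbol into the stack
    next_label: str = ""              # L2: label of the next group


def step(group, stack, bars, inp):
    """Execute the first matching statement of a group; return the next label.

    Assumes every statement has a non-empty stack string (n >= 1).
    """
    for st in group:
        n, m = len(st.stack), len(st.lookahead)
        if tuple(stack[-n:]) != st.stack or tuple(inp[:m]) != st.lookahead:
            continue                  # item (h): try the next statement
        if st.reduce_to is not None:
            stack[-n:] = [st.reduce_to]
        if st.push_bar is not None:
            bars.append(st.push_bar)
        if st.scan:
            stack.append(inp.pop(0))
        return st.next_label
    raise SyntaxError("no statement in the group matched")  # item (i)
```

A driver would call step repeatedly, using the returned label to select
the next group.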

Most FPL specifications have been hand coded. While this is not
a particularly difficult task, the syntax definition so obtained is of
little benefit to programmers who wish to program in ℒ. They would prefer
to work from a BNF specification of ℒ. Earley [4] and DeRemer [5] have
proposed schemes to convert BNF productions into FPL statements. The
algorithm described in this paper is an extension and implementation of
DeRemer's algorithm.

The use of special markers (bar symbols) in an auxiliary stack
facilitates the automatic generation of syntactic error recovery, as well
as minimizing the number of FPL statements which must be matched. Also,
reductions to nonterminal symbols are allowed only in the top of the
recognition stack, thereby further reducing the number of stack
comparisons required. The algorithm described below enables very fast and
efficient recognizers for many BNF grammars to be generated automatically.
It has been used successfully to generate parsers for TRANQUIL [6], a
language for specifying array-type algorithms, as well as several other
languages being developed on the ILLIAC IV project.


2.   The  Basic  Algorithm 
Consider a language ℒ to be defined by a grammar

    G = (VT, VN, S, P),

where

    VT = the set of terminal symbols of ℒ (represented by lower case
         Latin letters and the system supplied end marker ⊥);

    VN = a set of nonterminal symbols (represented by upper case
         Latin letters);

    S ∈ VN is the objective symbol;

    P is a numbered set of BNF production rules defining ℒ with
         Z ::= ⊥S⊥.

Appendix  B  is  a  sample  BNF  grammar  with  both  intermediate  and 
final  results  of  the  application  of  the  algorithm.  The  examples  used  in 
this  and  the  following  chapters  are  chosen  from  this  appendix.  Some  of 
the  examples  used  will  differ  slightly  from  the  appendix  but  in  each  case 
the  difference  will  be  explained  as  further  details  of  the  algorithm  are 
described. 

Define X ∈ VT ∪ VN to be a head symbol of N1 ∈ VN if there exist
BNF productions

    N1 ::= N2 ...
    N2 ::= N3 ...
      .
      .
    Nn ::= X ...

for n ≥ 1.


The following formal rules for converting from BNF productions
to FPL statements, for determining the formal labels of groups of FPL
statements, and for determining the statements which should constitute
each group were developed by DeRemer [5].

If α represents the string made up of the first n symbols on
the right of BNF production π, then the following BNF to FPL mapping
rules apply:

         BNF Production          FPL Statement

    a)   M ::= αN ...            α |        *Nh
    b)   M ::= αt ...            α |        *t(π, n+1)
    c)   M ::= α                 α | →M |   Mt

In these FPL statements the string α is to be compared with the top n
symbols in the recognition stack; * denotes scan another symbol from the
source program into the stack; →M| means replace the string α in the stack
by the nonterminal symbol M. The symbols on the right are labels of the
next group of FPL statements to be executed after complete execution of
those productions; Nh identifies a group of statements which attempts to
locate an initial (head) symbol of N in the stack; t(π, n+1) labels the
statement group which attempts to find a terminal t in the top of the
stack corresponding to the t in position n+1 of BNF production number π;
Mt labels the group which attempts to match constructs with an M at the
top of the stack. For example, BNF production 1

    Z ::= ⊥S⊥

converts to FPL productions 1, 14 and 15:

    ⊥    |        *Nh-S
    ⊥S   |        *t(1, 3)
    ⊥S⊥  | →Z |   Nt-Z

for n = 1, 2 and 3, respectively.
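
The three mapping rules can be sketched in a few lines. The function name
fpl_statements and the textual statement format (with "->" standing for
the reduction arrow) are assumptions made for illustration only.

```python
def fpl_statements(pi, lhs, rhs, nonterminals):
    """Apply the BNF to FPL mapping rules to production number pi.

    rhs is the list of right-side symbols; one statement is produced
    for each prefix length n = 1 .. len(rhs).
    """
    out = []
    for n in range(1, len(rhs) + 1):
        alpha = "".join(rhs[:n])
        if n == len(rhs):
            # rule (c): the whole right side is on the stack, so reduce
            out.append(f"{alpha} | ->{lhs} | Nt-{lhs}")
        elif rhs[n] in nonterminals:
            # rule (a): the next symbol is a nonterminal N, so scan and go to Nh
            out.append(f"{alpha} | *Nh-{rhs[n]}")
        else:
            # rule (b): the next symbol is a terminal in position n+1
            out.append(f"{alpha} | *t({pi}, {n + 1})")
    return out


for line in fpl_statements(1, "Z", ["⊥", "S", "⊥"], {"Z", "S"}):
    print(line)
```

Applied to BNF production 1 this yields the same three statements as the
example above.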

Define Xn(π) to be the nth symbol on the right side of BNF
production π. Then the following rules determine which labels (groups of
FPL statements) must exist.

(a)  For each N ∈ VN:
     label Nh exists if ∃ π, n: N = Xn(π), n > 1
     e.g., for S but not for D in the example.

(b)  For each N ∈ VN:
     label Nt exists if ∃ π, n: N = Xn(π), n > 0
     i.e., for all elements of VN except Z in the example.

(c)  For each t ∈ VT:
     label t(π, n) exists for each π, n: t = Xn(π), n > 1
     e.g., for x but not d in BNF production 7
     D ::= dEx
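
As a rough illustration of rules (a) to (c), the following sketch collects
the labels that must exist. The function name required_labels is
hypothetical, and only productions 1 and 7 quoted in the text are used;
the full grammar of the appendix is not reproduced here.

```python
def required_labels(productions, nonterminals):
    """productions maps a production number to (left side, right-side list)."""
    labels = set()
    for pi, (lhs, rhs) in productions.items():
        for n, sym in enumerate(rhs, start=1):
            if sym in nonterminals:
                labels.add(f"Nt-{sym}")        # rule (b): any position n > 0
                if n > 1:
                    labels.add(f"Nh-{sym}")    # rule (a): positions n > 1
            elif n > 1:
                labels.add(f"t({pi}, {n})")    # rule (c): terminals with n > 1
    return labels


productions = {
    1: ("Z", ["⊥", "S", "⊥"]),
    7: ("D", ["d", "E", "x"]),
}
print(sorted(required_labels(productions, {"Z", "S", "D", "E"})))
```

The terminal d in position 1 of production 7 produces no label, while x in
position 3 produces t(7, 3), as stated in rule (c).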


The next step is to determine the set of FPL descriptors for each FPL group
which must exist, where each descriptor will correspond to (normally) an
FPL production. The three rules are:

(a)  D_Nh = {(π, 1) | X1(π) ∈ VT and the left side of BNF production
            π is a head symbol of N}
     e.g., D_Nh-E = {(11, 1), (12, 1), (13, 1)}.

(b)  D_Nt = {(π, n) | N = Xn(π), n > 0, all π}
     e.g., D_Nt-C = {(3, 2), (6, 1)}.

(c)  D_t(π, n) = {(π, n)}
     e.g., D_t(1, 3) = {(1, 3)}.
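
The descriptor rules (a) and (b) can be sketched as set comprehensions
over the productions. Everything here is illustrative: the function names
are hypothetical, productions 6, 11, 12 and 13 are invented stand-ins
chosen only to reproduce the descriptor sets quoted above, and the sketch
assumes that N counts as a head symbol of itself in rule (a), which is
what the D_Nh-E example suggests.

```python
def head_symbols(productions, nonterminals):
    """Transitive closure of the 'first right-side symbol' relation."""
    heads = {n: set() for n in nonterminals}
    for lhs, rhs in productions.values():
        heads[lhs].add(rhs[0])
    changed = True
    while changed:
        changed = False
        for n in nonterminals:
            for h in list(heads[n]):
                if h in nonterminals and not heads[h] <= heads[n]:
                    heads[n] |= heads[h]
                    changed = True
    return heads


def nt_descriptors(productions, N):
    # rule (b): every occurrence of N on a right side
    return {(pi, n) for pi, (_, rhs) in productions.items()
            for n, sym in enumerate(rhs, start=1) if sym == N}


def nh_descriptors(productions, nonterminals, N):
    # rule (a), assuming N itself is included among its head symbols
    heads = head_symbols(productions, nonterminals)
    return {(pi, 1) for pi, (lhs, rhs) in productions.items()
            if rhs[0] not in nonterminals and (lhs == N or lhs in heads[N])}


productions = {
    3: ("S", ["a", "C", "q", "r"]),   # production 3 as quoted in the text
    6: ("B", ["C", "y"]),             # hypothetical
    11: ("E", ["f"]),                 # hypothetical
    12: ("E", ["f", "g"]),            # hypothetical
    13: ("E", ["f", "h"]),            # hypothetical
}
nonterminals = {"S", "B", "C", "E"}
print(sorted(nt_descriptors(productions, "C")))
print(sorted(nh_descriptors(productions, nonterminals, "E")))
```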


By determining the group labels, then the descriptors, and applying
the mapping rules to the set of BNF productions, it is always possible
to obtain an equivalent set of FPL statements which may be used as a syntax
recognizer for the language specified by the BNF grammar. Unfortunately,
this recognizer will usually be nondeterministic. To make it deterministic,
FPL statements within a group may have to be reordered, and, when two
statements mutually preclude each other (the placement of either before the
other will always preclude execution of the second one), expansion of
syntactic context in one or both of them may be necessary. In theory this
can always be done, but in practice it is necessary to obtain determinism
in a way which minimizes the expansion of context.


3.   Rules  to  Make  the  Algorithm  Deterministic 

A simple example of mutual preclusion is obtained when the BNF
grammar has productions of the form:

    1:   N ::= ab...
    2:   N ::= ac...

which lead to the creation in the Nh group of the FPL statements:

    a |   *t(1, 2)
    a |   *t(2, 2)

i.e., statements which have identical stack comparison strings,

e.g.,   Nh-E:    f | →E |   Nt-E
                 f |        *t(12, 2)
                 f |        *t(13, 2)

A first attempt at resolution of this problem is to combine the destination
group labels of the offending statements into a single combined group label,
combine the groups into a single group with that label, and replace the FPL
statements by a single statement. No combination of destination groups is
possible when one of the precluding statements involved is of the type

    α | →N |   Nt

i.e., reductions are not delayed. In this case an expansion of context
will be necessary.
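
A minimal sketch of how mutually precluding statements might be detected
and classified follows. The representation of a statement as a
(stack string, action) pair and the name classify_preclusions are
assumptions; actions starting with "*" are scans and all others are taken
to be reductions.

```python
from collections import defaultdict


def classify_preclusions(group):
    """group: list of (stack_string, action) pairs in one FPL group."""
    by_stack = defaultdict(list)
    for stack, action in group:
        by_stack[stack].append(action)
    result = {}
    for stack, actions in by_stack.items():
        if len(actions) < 2:
            continue                       # no preclusion for this stack string
        scans = [a for a in actions if a.startswith("*")]
        result[stack] = {
            # scan statements may share one combined destination group
            "combine": scans if len(scans) > 1 else [],
            # a reduction cannot be combined, so context must be expanded
            "needs_context": len(scans) < len(actions),
        }
    return result


nh_e = [("f", "->E | Nt-E"), ("f", "*t(12, 2)"), ("f", "*t(13, 2)")]
print(classify_preclusions(nh_e))
```

For the Nh-E group the two scan statements can be combined, but the
reduction still requires expanded context.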


The following revisions are made to the BNF grammar prior to
conversion to FPL statements:

(a)  Two BNF productions of the form
         A ::= αB...
     and C ::= αt...
     are changed to
         A ::= αB...
         C ::= αD...
         D ::= t

(b)  Two BNF productions of the form
         A ::= αN...
     and B ::= αM...,
     where M is a head symbol of N, are changed to
         A ::= αN...
         B ::= αD...
         D ::= M
     e.g., BNF production 3
         S ::= aCqr
     becomes BNF productions 3 and 18
         S ::= aZqr
         Z ::= C
     since the C of production 3 is a head symbol of the B of
     production 2.
If the above revisions have been made to the BNF grammar, and
neither of two precluding FPL statements involves a reduction, then the
statements will be of the form:

        α |   *Nh
    and α |   *Mh
or
        α |   *t(π1, m)
    and α |   *t(π2, m)

where m = n + 1. These can be combined into

        α |   *ch(p)
or
        α |   *ct(p)

where the ch(p) and ct(p) groups of statements are the union of the Nh and
Mh groups, and the t(π1, m) and t(π2, m) groups, respectively, and p is
described below.

e.g., FPL production 2

    Nh-S:    a |   *ch(1)

is a result of combining

    a |   *Nh-B
    a |   *Nh-Z

which arise from the descriptors (2, 1) and (3, 1) for Nh-S.

When a production of the form:

    α |   *Nh

corresponding to the descriptor (π, n) is executed, a special marker (bar
symbol) denoted by N(π, m) is pushed into a separate bar symbol stack.
This is represented in the appropriate FPL statement by:

    α | ←N(π, m) |   *Nh

If the statements arising from descriptors (π1, n) and (π2, n) are
identical then a special bar symbol N*(p) is used in a single combined
statement, where p is a pointer to the bar symbols N(π1, m) and N(π2, m)
(there may be more than two). A combined statement involving several
different nonterminal symbols is represented by

    α | ←(p) |   *ch(p),

where p is a pointer to a list of the bar and special bar symbols involved,

e.g., FPL productions 1, 2 and 26

    ⊥ | ←S(1, 2) |   *Nh-S
    a | ←(1)     |   *ch(1)
    d | ←E*(2)   |   *Nh-E

The FPL statements in each Nt group are classified as type a or
type b, according as their descriptors (π, n) have n = 1 or n > 1,
respectively. Only type a statements are relevant in the syntax analysis
if the symbol at the top of the bar symbol stack is not N, or a pointer (p)
to a symbol list containing N, because a terminal head symbol is always
sought first to begin the construction of a nonterminal and a bar symbol is
not, therefore, pushed onto the bar stack for any nonterminal head symbol,
i.e., the latter is never explicitly looked for. Thus the type a FPL
statements form a subgroup, labeled Nta. The Nta subgroup is void if N is
never a head symbol of a BNF definition of a nonterminal other than itself;

e.g.,  Nta-C:   C | j        *t(6, 2)
                C | →Z |     Nt-Z
If, after a reduction to N, the top symbol in the bar stack is
N(π, n) or a pointer (p) to a symbol list containing N(π, n), then either
a left recursive production (to continue reduction to N) or the FPL
statement with descriptor (π, n) applies next. Thus an Ntb(π, n) subgroup
which includes the FPL statements arising from the recursive BNF
production defining N, followed by a statement to remove the top symbol
from the bar stack, followed by the statement with descriptor (π, n), is
generated. If the top bar symbol is N*(p), or if this symbol is included in
a combined list at the top of the stack, then a combined subgroup cNtb(p)
is generated in like manner but with the several statements determined by
the descriptors in the list pointed to by (p) being placed after the bar
removal statement;

e.g.,  Ntb(9, 2):    F  |        *t(10, 2)
                     pop bar stack
                     iF | →D |   Nt-D

The combination rules described above may now be applied in a
Nta subgroup, in the recursive part of a Ntb or cNtb subgroup, and in the
nonrecursive part of a cNtb subgroup. No combination is allowed between
recursive and nonrecursive statements in any subgroup.

With this subgrouping, transfer to the group label Nt becomes a
dynamic transfer, DNt, to Nta, a Ntb(π, n) or a cNtb(p) subgroup
depending on the DNt symbol currently at the top of the bar stack;

e.g., FPL production 21

    iF |   →D |   DNt-D

If preclusions still exist after the subgrouping and combining
described above then contextual expansion is required. A great deal of
information concerning the symbols preceding the string α in the stack
comparison part of an FPL statement is implicit in the grouping of the
statement. Therefore, only right context expansion (lookahead) is employed.
This is done by generating, for each statement involved in the preclusion,
all possible strings (up to length k) of terminal symbols which may follow
the α in the stack comparison part. If all such strings are different for
the involved statements the preclusion has been eliminated. Experience has
shown that a lookahead of k = 1 symbol is usually sufficient to eliminate
preclusions and no practical examples have been found where a finite
lookahead of more than k = 3 symbols is necessary to resolve a preclusion.
Thus, when contextual analysis is necessary, a one symbol lookahead is
generated. If this fails to differentiate, a three symbol lookahead is
generated. If this also fails to resolve preclusions, the attempt to obtain
a deterministic FPL recognizer from the given BNF grammar is terminated.
The lookahead for a particular FPL production need only be enough to
differentiate it from any following productions which it precludes. Thus
the last of a set of precluding productions needs no lookahead;

e.g.,  Nh-E:   f | x   →E |   DNt-E
               f |            *ct(4)
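
The one-symbol-then-three-symbol strategy can be sketched as a
disjointness check on lookahead prefixes. This is a simplified model: the
name resolve is hypothetical, the sets of following terminal strings are
assumed to be given, and the refinement that the last precluding
statement needs no lookahead is not modeled.

```python
def resolve(follow_strings, ks=(1, 3)):
    """Return the smallest tried lookahead length that separates the statements.

    follow_strings holds one set of terminal strings per precluding
    statement; None is returned if neither k = 1 nor k = 3 suffices.
    """
    for k in ks:
        prefixes = [{s[:k] for s in strs} for strs in follow_strings]
        separated = all(
            prefixes[i].isdisjoint(prefixes[j])
            for i in range(len(prefixes))
            for j in range(i + 1, len(prefixes))
        )
        if separated:
            return k
    return None


print(resolve([{"xa"}, {"yb"}]))      # one symbol is enough
print(resolve([{"abc"}, {"abd"}]))    # three symbols are needed
print(resolve([{"abcd"}, {"abce"}]))  # not resolved: generation would stop
```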


4.   Optimization of Interpretive Instructions

The construction of a syntactic analyzer out of the FPL statements
involves the construction of a string of operators and operands which
is either interpretively executed or converted to an ALGOL program to
be compiled and then executed directly. The operators fall into seven
classes: pointer initializing, recognition stack tests, lookahead queue
tests, recognition stack manipulation, transfer of control, semantic routine
calls, and error recovery, a complete list of which is given in Appendix A.

Recognition  Stack  Tests 

In the Nta, Ntb(π, n) and cNtb(p) type subgroups no recognition
stack tests are necessary. In the Nta type subgroups, such as Floyd
production number 12:

    Nta-D:    D |        *ct(3)

the string α consists of the single symbol D which is put there immediately
before transfer to this group either by production number 21:

    iF |   →D |   DNt-D

or by production number 28:

    i  |   →D |   DNt-D

In the Ntb(π, n) and cNtb(p) type groups the string α is of the form βN,
where β is of length n, n ≥ 1, for example, Floyd production number 21
above. In this case, the presence of the string β = "i" is verified just
before the symbol N(π, n) (= "F(9, 2)") is pushed into the bar stack, in
production number 27:

    i | {k, l, m}   ←F(9, 2) |   *Nh-F

(where the lookahead symbol may be any one of k, l or m), and a transfer
is made to the Nh-F group to begin seeking the constituents of F. When F
has been found, the recognition stack contains iF, the top symbol in the
bar stack is then F(9, 2), and control is transferred to the type b
subgroup, Ntb(9, 2).

In the Nh and ch(p) type groups, a terminal head symbol of a
nonterminal is sought. Hence, the α's all consist of single terminal
symbols, as in Nh-F:

    k |   →F |   DNt-F
    l |   →F |   DNt-F
    m |   →F |   DNt-F

In the ct(p) type groups the α in the stack is of the form βt where the β
was recognized in the previous production and the t is the symbol scanned
just before the transfer to this group. For example, ct(5):

    dEx |   →C |   DNt-C
    dEy |   →C |   DNt-C

where the "dE" has been recognized in production 30:

    dE |   *ct(5)

Hence, in all three of these type groups it is sufficient to test only the
top (terminal) symbol of the recognition stack.

The statements in each of these groups are ordered before they
are processed so that all those with identical stack comparisons
(differentiated by lookahead) are together. The symbol at the top of the
recognition stack is tested on the first statement with an instruction
which, upon failure, transfers to the beginning of the next statement with
a different symbol at the top of the stack. No stack test is made on the
statements in between. An example is group ch(1):

    d |             ←E*(2)   |   *Nh-E
    i | {k, l, m}   ←F(9, 2) |   *Nh-F
    i |             →D       |   DNt-D

Here the test for "i" in the third production is not made. Failure
in the stack test in the second production causes a transfer over the third
production to the end of the group; failure in the lookahead test in the
second production causes a transfer to the next (third) production. In the
event that several statements have different stack comparisons but all take
the same actions (same semantic routine calls and same recognition stack
reduction), a mode pattern is built with a bit on for the stack symbol of
each statement. One instruction is produced for the stack tests of all of
these statements. It simply checks to see if, in the appropriate row, the
bit corresponding to the top stack symbol is on. The Nh-F group mentioned
above is an example of this.
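
The bit-pattern stack test can be illustrated with integer bit masks. The
symbol numbering and the names BIT, make_row and row_test are assumptions
for this sketch; the paper's pattern array is modeled as one integer per
row.

```python
# Hypothetical numbering of terminal symbols; each symbol owns one bit.
SYMBOLS = ["d", "i", "k", "l", "m"]
BIT = {s: 1 << i for i, s in enumerate(SYMBOLS)}


def make_row(symbols):
    """Build one row of the pattern array with a bit on for each symbol."""
    row = 0
    for s in symbols:
        row |= BIT[s]
    return row


def row_test(row, top):
    """Single test replacing one stack comparison per statement."""
    return bool(row & BIT.get(top, 0))


nh_f_row = make_row("klm")       # k, l and m all reduce to F in Nh-F
print(row_test(nh_f_row, "l"))   # True: the reduction to F applies
print(row_test(nh_f_row, "d"))   # False: no statement of the group matches
```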

Since the t(π, m) groups are identical in basic form to the ct(p)
groups they are handled in the same way as the preceding except that there
is always only one statement in each group and only one transfer to each
group (which yields a further optimization to be discussed later).

Lookahead  Contextual  Analysis  Tests 

These  tests  follow  the  recognition  stack  tests  whenever  needed 
and  are  implemented  with  three  main  types  of  instructions: 




(1)  if  the  right  symbol  is  present,  increment  the  lookahead 
level  pointer  and  go  on  with  next  instruction,  otherwise 
branch  to  another  instruction; 

(2)  if  the  right  symbol  is  present,  branch  to  another 
instruction,  otherwise  go  on  with  the  next  instruction; 

(3)  if  the  right  symbol  is  present,  go  on  with  next 
instruction,  otherwise  branch  to  the  beginning  of 
the  next  statement. 

The  address  of  the  next  statement  is  set  in  a  global  location  just  before 
the  string  of  lookahead  test  instructions.  Each  of  the  last  two  types 
include  a  bit  pattern  test  like  that  used  in  the  stack  tests  above.  To 
optimize  the  lookahead  test  the  lookahead  strings  are  ordered  as  in  the 
following  example: 

    given strings      ordered

    cef                a
    cd                 b
    cij                cd
    cik                cef
    a                  ceg
    ceg                ceh
    b                  cij
    cil                cik
    ceh                cil
The following figure shows how the above lookahead strings would be marked
for testing with subscripts indicating which of the three types of test is
needed for each symbol. Subscript 0 indicates that no test is needed, and
x, y and z are the bit patterns formed for this example:

    [Figure: the ordered lookahead strings with test-type subscripts]

    a, b ∈ x
    f, g, h ∈ y
    j, k, l ∈ z
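
The ordering of the lookahead strings and the grouping of final symbols
into bit patterns can be reproduced with a sort and a prefix tree. This is
a sketch under assumptions: build_trie and leaf_sets are hypothetical
names, and no lookahead string is assumed to be a prefix of another.

```python
def build_trie(strings):
    """Insert the sorted lookahead strings into a nested-dict prefix tree."""
    trie = {}
    for s in sorted(strings):
        node = trie
        for ch in s:
            node = node.setdefault(ch, {})
    return trie


def leaf_sets(node):
    """For each trie node, list the children that end a lookahead string."""
    sets = []
    leaves = sorted(ch for ch, sub in node.items() if not sub)
    if leaves:
        sets.append(leaves)
    for ch in sorted(node):
        sets.extend(leaf_sets(node[ch]))
    return sets


given = ["cef", "cd", "cij", "cik", "a", "ceg", "b", "cil", "ceh"]
print(sorted(given))
print(leaf_sets(build_trie(given)))
```

The leaf sets with more than one member correspond to the patterns x, y
and z; the singleton d needs only a plain symbol test.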


Let XLB be the instruction type 1 (branch on failure), XLA be type 2
(branch on success), and XLL be type 3 (branch to next production on
failure), with a suffix "B" meaning a bit-pattern test, and Li representing
a transfer to label Li as indicated by the instruction type. Then the
following is the list of instructions performing this lookahead contextual
analysis:

            XLAB   (x, L2)
            XLL    (c)
            XLA    (d, L2)
            XLB    (e, L1)
            XLLB   (y)
            GOTO   (L2)
    L1:     XLL    (i)
            XLLB   (z)
    L2:     (rest of this Floyd production)
Transfer of Control

Some additional optimization can be applied if the symbol following
α is a terminal which is not combined. The fact indicated earlier,
namely, that there is always only one statement in a t(π, m) group and only
one transfer to it, means that there is no need to generate a transfer.
Therefore, the next symbol is scanned and the t(π, m) statement is created
immediately as if it were part of the same statement. Floyd productions
18 and 19 are an example:

                   F  | n         *t(10, 2)
    t(10, 2):      Fn |    →F |   DNt-F

Further, if a lookahead was performed then the symbol was successfully
tested in the lookahead test so no test need be performed in the t(π, m)
statement. The next symbol is scanned and the recognition stack pointer
is incremented without a test. This is the case in the above example. If
a lookahead of at least k symbols is required, then this is done for
t(π, n + i) (1 ≤ i ≤ k) statements, if they exist.

Several  other  minor  optimizations  involving  stack  reduction  and 
transfer  of  control  have  been  implemented,  as  indicated  by  the  description 
of  the  parser  operators  given  in  Appendix  A. 




5.   Error  Recovery 

At  the  end  of  each  group  there  is  an  error  statement  which  is 
applied  if  every  statement  in  that  group  failed  to  match.  The  general 
error  recovery  technique  used  has  been  to  look  at  the  top  symbol  in  the 
bar  symbol  stack,  reduce  the  stack  to  the  nonterminal  named  in  the  case  of 
an  N  or  N*,  skip  the  input  to  the  first  symbol  which  can  follow  N,  and 
transfer  to  the  Ntb  group  indicated  by  the  bar  symbol.   If  the  bar  symbol 
is  a  combined  bar  symbol,  then  the  input  is  scanned  for  the  first  symbol 
that  can  follow  any  of  the  bar  symbols  in  the  combined  group.  When  one 
is  found  the  bar  symbol  it  follows  is  treated  as  above.  This  implies  the 
existence  of  a  table  which  gives,  for  each  occurrence  of  a  nonterminal, 
a  table  of  terminal  symbols  which  may  immediately  follow  that  nonterminal 
occurrence.  This  table  actually  includes  terminal  occurrences  also,  as 
will  be  seen  later,  and  is  generated  in  the  same  manner  as  the  lookahead 
strings  are  generated. 

It sometimes happens that the symbol in error is first encountered
in a lookahead test. This can cause the appropriate FPL statement
to be skipped in favor of a wrong one. The parse is directed down a
wrong path and several reductions sometimes can be made and bar symbols
popped before the error is detected. This causes a greater portion of the
input to be skipped than if all the bar symbols had been retained.

An ALGOL example of this problem would be the following:
Suppose an arithmetic expression in an assignment statement contains an
incorrect exponentiation operator. Then when it becomes the next input
symbol, the bar stack will contain primary, factor, term, arithmetic
expression, statement, compound tail, program. The lookaheads needed to
decide if the end of each construct has been reached could be ordered in
such a way that the reduction is always made and the bar symbol popped
until the END following the last statement is needed, and inserted, to make
the compound tail until, finally, only the program symbol is left. Then the
input is scanned to the first symbol following program, namely end-of-file
mark, i.e., the rest of the source string is skipped; whereas, if the
primary were the top bar symbol only a few symbols to the next operator,
";", END, ELSE, etc. would have been skipped.

In  order  to  avoid  this  problem  an  additional  test  is  made  in  any 
group  where  lookahead  testing  is  needed  to  check  whether  the  next  symbol  is 
in  the  set  of  symbols  which  can  occur  at  that  point.   If  it  cannot,  then  the 
general  error  recovery  scheme  is  called  immediately. 

There are many situations which have special features and allow
for more specialized error recovery than that outlined above. A
discussion of these now follows.

The t(π, m) type of statement is executed when the previous history
of the parse leaves no choice. Since there is only one production and,
therefore, only one possibility for the top symbol in the stack it may be
inserted by the following insertion rules if it is not there. The parse
then may proceed as if no error had occurred; no error production is
required.

Rules  for  Insertion 

Assume the following symbolism: β = the part of the stack below
the top symbol, γ = the unscanned portion of the input after the next symbol,
a = the symbol that is sought by the test, b = any symbol which can
immediately follow this occurrence of a, c = any symbol, | = the top of the
stack (the stack to the left, and the unscanned input to the right),
⇒ = if the situation to the left of the arrow holds, then change it
to the situation on the right. The following are the rules; the first one
from the top that applies is the one that is used:

    β b  | a γ    ⇒   β a | b γ
    β b  | c γ    ⇒   β a | b c γ
    β c  | a γ    ⇒   β a | γ
    β c1 | c2 γ   ⇒   β a | c2 γ
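
The four insertion rules can be sketched directly on a stack and an input
queue held as Python lists. The name insert_sought is hypothetical, the
comments describing each rule are one reading of its intent, and the
input is assumed to be non-empty (an end marker is always present).

```python
def insert_sought(stack, inp, a, can_follow_a):
    """Make `a` the top of the stack using the first applicable rule.

    stack[-1] is the top stack symbol, inp[0] the next input symbol, and
    can_follow_a the set of symbols b that may immediately follow a.
    Returns the number of the rule applied.
    """
    top, nxt = stack[-1], inp[0]
    if top in can_follow_a and nxt == a:   # rule 1: b and a are interchanged
        stack[-1] = a
        inp[0] = top
        return 1
    if top in can_follow_a:                # rule 2: a is missing before b
        stack[-1] = a
        inp.insert(0, top)
        return 2
    if nxt == a:                           # rule 3: c is dropped, a is taken
        stack[-1] = a
        inp.pop(0)
        return 3
    stack[-1] = a                          # rule 4: c1 is replaced by a
    return 4


stack, inp = ["p", "b"], ["c", "z"]
print(insert_sought(stack, inp, "a", {"b"}), stack, inp)
```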

The second special case occurs when an Nh, ch(p), or ct(p) group consists
only of statements all of which contain the same symbol in the stack
comparison field; then the stack test can insert the symbol if it is not
there, using the above rules for insertion as in the t(π, m) statements.
The error statement is not then needed since the parse will continue in the
same way regardless of the outcome of the stack comparison.

The next special case is that of a group in which one statement
has a stack symbol of a character or special word whereas all the others
of that group have, as stack comparison symbol, a terminal class symbol
(identifier, number, or string). In this case the assumption is made that
the error is far more likely to have occurred with the specific terminal
symbol than with a terminal class symbol. The error statement here inserts
the specific terminal symbol according to the rules for insertion and
transfers back to the appropriate place in the corresponding statement.

The last special case applies when all the statements of a group
make the same reduction. In this case that reduction is made anyway. The
top symbol of the stack, which didn't match any of the stack tests, is put
back into the input queue if it can follow the nonterminal to which the
stack is reduced. The Nh-F group is an example:

    k |   →F |   DNt-F
    l |   →F |   DNt-F
    m |   →F |   DNt-F

Note that no error statement is needed in the Nta, Ntb(π, n)
and cNtb(p) type subgroups because no stack test is made in these groups
and the last production requires no lookahead test so it, at least, will
match.




6.   Conclusion 

The FPL parsing algorithm, not surprisingly, has similarities to
precedence parsing algorithms in that the three different possible stack
actions (do nothing, ←N, →N) which can be specified in a FPL statement
correspond to the three precedence operators ≐, <·, and ·>, respectively.
The conversion algorithm is rather slow compared with some other
algorithms but has the advantage that, with one exception, it is able to
make use of more context than is employed in precedence schemes in
determining the bounds of the phrase next to be reduced. A more general
error recovery capability than that usually associated with precedence
techniques is included in the algorithm.

Careful consideration of rather obvious optimizations in the
information included in the FPL statements has been reflected in the FPL
statement generation algorithm, thereby enabling the production of highly
efficient syntax recognizers. Representation of the basic parser
interpreter instructions as hardware instructions, or at least as
microprogrammed sequences, would further enhance overall compiler
performance. Interpretation of syntax tables also can be avoided by directly
coding the FPL statements in a higher level language for machines, such as
the B5500 and B6500, which have a suitably matched software-hardware
capability.
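
As an illustration of direct coding, the Nh-F group of Appendix B (k| -> F, l| -> F, m| -> F, each continuing at DNt-F) might be written as an ordinary function. The sketch is in Python rather than in a B5500 language, and the function name and the returned label strings are assumptions made for the example.

```python
def nh_f(stack):
    """Directly coded Nh-F group; returns the label of the group to run next."""
    if stack and stack[-1] in ("k", "l", "m"):
        stack[-1] = "F"        # reduction: rename the top symbol to F
        return "DNt-F"
    return "error"             # no statement of the group matched


stack = ["a", "i", "k"]
print(nh_f(stack), stack)      # DNt-F ['a', 'i', 'F']
```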


Appendix A:  Parser Operators

Mnemonic     Operands   Action

LLVL                    Initialize lookahead buffer test level pointer
                        to the first position
ILVL                    Increment the recognition stack pointer
XSBS         S,A        Test the top of the stack for symbol S
                          yes => increment stack pointer
                          no  => branch to address A
XSBT         T,A        Test the top of the stack for class symbol type T
                          yes => increment stack pointer
                          no  => branch to address A
XSBB         R,A        Test the top of the stack with row R of the
                        pattern array
                          marked     => increment stack pointer
                          not marked => branch to address A
XSIS         S          Test the top of the stack for symbol S
                          yes => increment stack pointer
                          no  => insert S at top of stack and
                                 increment stack pointer
XSIT         T          Test the top of the stack for class symbol type T
                          yes => increment stack pointer
                          no  => insert a symbol of type T at top
                                 of stack and increment stack pointer
XLBS         S,A        Test the input queue for symbol S
                          yes => increment lookahead level pointer
                          no  => branch to address A
XLBT         T,A        Test the input queue for class symbol type T
                          yes => increment lookahead level pointer
                          no  => branch to address A
XLAS         S,A        Test the input queue for symbol S
                          yes => branch to address A
                          no  => go on
XLAT         T,A        Test the input queue for class symbol type T
                          yes => branch to address A
                          no  => go on
XLAB         R,A        Test the input queue with row R of the
                        pattern array
                          marked     => branch to address A
                          not marked => go on
XLLS         S          Test the input queue for symbol S
                          yes => go on
                          no  => branch to address in NEXTPRODUCTION
XLLT         T          Test the input queue for class symbol type T
                          yes => go on
                          no  => branch to address in NEXTPRODUCTION
XLLB         R          Test the input queue with row R of the pattern
                        array
                          marked     => go on
                          not marked => branch to address in NEXTPRODUCTION
NPOP         N          Subtract N from the recognition stack pointer
RED1         S          Change the name of the top symbol of the
                        recognition stack to S
REDN         N,S        Subtract N from the recognition stack pointer
                        and change the name of the top symbol of
                        the recognition stack to S
BPSH         B          Push bar symbol B into the bar stack
BPOP                    Pop the top bar symbol from the bar stack
TPSH                    Put the next input symbol into the recognition
                        stack at location of the recognition stack
                        pointer
EXEC         N          Execute semantic routine N
XTSM         N          Execute semantic routine N and test global
                        Boolean SEMANTICTEST
                          true  => go on
                          false => branch to address in NEXTPRODUCTION
XRSM         N          Execute semantic routine N and test global
                        Boolean SEMANTICTEST
                          true  => go on
                          false => print error message and go on
NOOP                    Go on
SKIP         N          Skip N characters to next row of parser
                        instruction table
GOTO         A          Branch to address A
XBGO         A          Test top stack symbol with top bar stack
                        symbol (possibly going into a combined group)
                          match    => branch to address in bar symbol
                          no match => branch to address A
SETS         A          Put A in NEXTPRODUCTION
ERKE                    Test top of stack with next input symbol to
                        see if latter can follow the former
                          yes => go on
                          no  => execute code for ERRR instruction
[illegible]  S,A        Print error message, insert terminal symbol S
                        at top of stack and go to address A
ERRN         S,A        Print error message, reduce stack to nonterminal
                        symbol S, and go to A
ERRR                    Print error message, recover from error by
                        using top bar symbol
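
To make the stack-pointer conventions of a few of these operators concrete, here is a minimal interpreter sketch in Python. It is one reading of the table, not the implementation described in this paper: it assumes that the tested "top of the stack" is the slot at the recognition stack pointer, that reductions rename the slot just below the pointer, and that an address A is an index into a list of instructions. Only TPSH, ILVL, XSBS, XSBB, RED1, NPOP, REDN, and GOTO are covered; the lookahead, bar stack, semantic, and error operators are omitted. The class name `Machine` and the symbol `$` (standing for the end marker) are assumptions.

```python
class Machine:
    """Interprets a few stack operators; sp is the index of the slot under test."""

    def __init__(self, stack, queue, patterns):
        self.stack = list(stack)       # recognition stack, bottom first
        self.sp = len(self.stack)      # recognition stack pointer
        self.queue = list(queue)       # input queue, next symbol first
        self.patterns = patterns       # pattern array: row number -> set of symbols

    def _tested(self):
        """Symbol in the slot at sp, or None if that slot is empty."""
        return self.stack[self.sp] if self.sp < len(self.stack) else None

    def run(self, program):
        pc = 0
        while pc < len(program):
            op, *args = program[pc]
            pc += 1
            if op == "TPSH":           # next input symbol goes to the slot at sp
                del self.stack[self.sp:]
                self.stack.append(self.queue.pop(0))
            elif op == "ILVL":
                self.sp += 1
            elif op == "XSBS":         # test for symbol S, else branch to A
                symbol, address = args
                if self._tested() == symbol:
                    self.sp += 1
                else:
                    pc = address
            elif op == "XSBB":         # test against pattern row R, else branch to A
                row, address = args
                if self._tested() in self.patterns[row]:
                    self.sp += 1
                else:
                    pc = address
            elif op == "RED1":         # rename the top symbol
                self.stack[self.sp - 1] = args[0]
            elif op in ("NPOP", "REDN"):
                self.sp -= args[0]     # drop N symbols
                del self.stack[self.sp:]
                if op == "REDN":       # and rename the new top symbol
                    self.stack[self.sp - 1] = args[1]
            elif op == "GOTO":
                pc = args[0]
            else:
                raise ValueError(f"operator not covered by this sketch: {op}")


# Scan one symbol, test it against pattern row 3 (b, c), and reduce "D b" to B.
program = [("TPSH",), ("XSBB", 3, 3), ("REDN", 1, "B")]
machine = Machine(["a", "D"], ["b", "$"], {3: {"b", "c"}})
machine.run(program)
print(machine.stack, machine.sp, machine.queue)  # ['a', 'B'] 2 ['$']
```

The three-instruction program mirrors the shape of the Nta-D and ct(3) code in Appendix B(d) with the XBGO and error instructions left out.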
Appendix B:  Conversion of a Simple BNF Grammar


(a)  The BNF productions:

Production number    1:    Σ  ::=  ⊥ S ⊥
                     2:    S  ::=  a B      |
                     3:            a C q r
                     4:    B  ::=  D b      |
                     5:            D c      |
                     6:            C j
                     7:    C  ::=  d E x    |
                     8:            d E y
                     9:    D  ::=  i F      |
                    10:            i
                    11:    E  ::=  f        |
                    12:            f g      |
                    13:            f h
                    14:    F  ::=  F n      |
                    15:            k        |
                    16:            l        |
                    17:            m

A dummy nonterminal symbol Z is needed at (3,2), so production 3
is changed and production 18 is added as follows:

                     3:            a Z q r
                    18:    Z  ::=  C
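
As a cross-check on the grammar, the terminals that can begin a phrase for each nonterminal can be computed with a short fixed-point iteration. The Python sketch below is not part of the conversion algorithm described in this paper; the names `GRAMMAR` and `head_terminals` are assumptions, and `GOAL` and `$` are ASCII stand-ins for the goal symbol and the end marker. It shows that after the symbol a, where a B or a Z is expected, the next terminal must be d or i.

```python
# Appendix B grammar after the change to production 3 and the addition of 18.
# "GOAL" and "$" are ASCII stand-ins for the goal symbol and the end marker.
GRAMMAR = {
    "GOAL": [["$", "S", "$"]],
    "S": [["a", "B"], ["a", "Z", "q", "r"]],
    "B": [["D", "b"], ["D", "c"], ["C", "j"]],
    "C": [["d", "E", "x"], ["d", "E", "y"]],
    "D": [["i", "F"], ["i"]],
    "E": [["f"], ["f", "g"], ["f", "h"]],
    "F": [["F", "n"], ["k"], ["l"], ["m"]],
    "Z": [["C"]],
}


def head_terminals(grammar):
    """Map each nonterminal to the terminals that can begin a phrase for it."""
    heads = {nt: set() for nt in grammar}
    changed = True
    while changed:                      # iterate to a fixed point
        changed = False
        for nt, right_parts in grammar.items():
            for right_part in right_parts:
                first = right_part[0]   # no empty right parts in this grammar
                new = heads[first] if first in grammar else {first}
                if not new <= heads[nt]:
                    heads[nt] |= new
                    changed = True
    return heads


heads = head_terminals(GRAMMAR)
print(sorted(heads["B"] | heads["Z"]))  # ['d', 'i']
print(sorted(heads["F"]))               # ['k', 'l', 'm']
```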


(b)  FPL statement group labels and descriptors

FPL Statement Group Label      Descriptor Set

Nh-Σ (start)                   (1,1)
Nh-S                           (2,1)(3,1)
Nh-E                           (11,1)(12,1)(13,1)
Nh-F                           (15,1)(16,1)(17,1)
Nta-Σ                          exit
Nta-C                          (6,1)(18,1)
Nta-D                          (4,1)(5,1)
Ntb(1,2)                       (1,2)
Ntb(2,2)                       (2,2)
Ntb(3,2)                       (3,2)
Ntb(9,2)                       (14,1)(9,2)
ch(1)                          (7,1)(8,1)(9,1)(10,1)
cNtb(2)                        (7,2)(8,2)
ct(3)                          (4,2)(5,2)
ct(4)                          (12,2)(13,2)
ct(5)                          (7,3)(8,3)
t(1,3)                         (1,3)
t(3,3)                         (3,3)
t(3,4)                         (3,4)
t(6,2)                         (6,2)
t(14,2)                        (14,2)
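
The descriptors can be read as (production number, position in the right part). Under that reading, which is an interpretation of the table rather than a definition quoted from the text, a few of the combined groups can be checked mechanically. In the Python sketch below, the names `RIGHT_PARTS`, `GROUPS`, `symbol_at`, and `group_symbols` are assumptions, and `$` stands for the end marker.

```python
# Right parts of the Appendix B productions, indexed by production number
# (production 3 in its modified form, with production 18 added).
RIGHT_PARTS = {
    1: ["$", "S", "$"], 2: ["a", "B"], 3: ["a", "Z", "q", "r"],
    4: ["D", "b"], 5: ["D", "c"], 6: ["C", "j"],
    7: ["d", "E", "x"], 8: ["d", "E", "y"], 9: ["i", "F"], 10: ["i"],
    11: ["f"], 12: ["f", "g"], 13: ["f", "h"],
    14: ["F", "n"], 15: ["k"], 16: ["l"], 17: ["m"], 18: ["C"],
}

# Descriptor sets of four combined groups, copied from the table above.
GROUPS = {
    "ch(1)": [(7, 1), (8, 1), (9, 1), (10, 1)],
    "ct(3)": [(4, 2), (5, 2)],
    "ct(4)": [(12, 2), (13, 2)],
    "ct(5)": [(7, 3), (8, 3)],
}


def symbol_at(descriptor):
    """Return the symbol at a (production, position) descriptor; positions start at 1."""
    production, position = descriptor
    return RIGHT_PARTS[production][position - 1]


def group_symbols(label):
    """Return the distinct symbols that the descriptors of a group point at."""
    return sorted({symbol_at(d) for d in GROUPS[label]})


for label in GROUPS:
    print(label, group_symbols(label))
# ch(1) ['d', 'i']
# ct(3) ['b', 'c']
# ct(4) ['g', 'h']
# ct(5) ['x', 'y']
```

The symbol sets for ct(5), ct(3), and ct(4) agree with rows 1, 3, and 4 of the pattern array listed with the interpreter instructions in part (d).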
(c)  The FPL statements generated:

 1.  Nh-Σ (start):   ⊥|        <-S(1,2)        |  *  Nh-S
 2.  Nh-S:           a|        <-[illegible]   |  *  ch(1)
 3.  Nh-E:           f|x,y     -> E            |     DNt-E
 4.                  f|                        |  *  ct(4)
 5.  Nh-F:           k|        -> F            |     DNt-F
 6.                  l|        -> F            |     DNt-F
 7.                  m|        -> F            |     DNt-F
 8.  Nta-Σ:          success exit
 9.  Nta-C:          C|j                       |  *  t(6,2)
10.  t(6,2):         Cj|       -> B            |     DNt-B
11.                  C|        -> Z            |     DNt-Z
12.  Nta-D:          D|                        |  *  ct(3)
13.  Ntb(1,2):       pop bar stack
14.                  ⊥S|                       |  *  t(1,3)
15.  t(1,3):         ⊥S⊥|      -> Σ            |     DNt-Σ
16.  Ntb(2,2):       pop bar stack
17.                  aB|       -> S            |     DNt-S
18.  Ntb(9,2):       F|n                       |  *  t(10,2)
19.  t(10,2):        Fn|       -> F            |     DNt-F
20.                  pop bar stack
21.                  iF|       -> D            |     DNt-D
22.  Ntb(3,2):       pop bar stack
23.                  aZ|                       |  *  t(3,3)
24.  t(3,3):         aZq|                      |  *  t(3,4)
25.  t(3,4):         aZqr|     -> S            |     DNt-S
26.  ch(1):          d|        <-E*(2)         |  *  Nh-E
27.                  i|k,l,m   <-F(9,2)        |  *  Nh-F
28.                  i|        -> D            |     DNt-D
29.  cNtb(2):        pop bar stack
30.                  dE|                       |  *  ct(5)
31.  ct(3):          Db|       -> B            |     DNt-B
32.                  Dc|       -> B            |     DNt-B
33.  ct(4):          fg|       -> E            |     DNt-E
34.                  fh|       -> E            |     DNt-E
35.  ct(5):          dEx|      -> C            |     DNt-C
36.                  dEy|      -> C            |     DNt-C
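
The way a recognizer picks a statement within a group can be sketched as an ordered search. The Python below is illustrative only: the `Statement` fields and the `select` function are assumptions, the lookahead is simplified to a set of acceptable next symbols, and the bar stack and error statements are left out. The group shown is Nta-C, with its two statements C|j * t(6,2) and C| -> Z DNt-Z.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Statement:
    stack: Tuple[str, ...]               # symbols compared with the top of the stack
    lookahead: Tuple[str, ...] = ()      # acceptable next input symbols; empty = no test
    reduce_to: Optional[str] = None      # nonterminal of a "-> N" action, if any
    scan: bool = False                   # True for "*"
    next_label: str = ""


# The two statements of the Nta-C group.
NTA_C = [
    Statement(stack=("C",), lookahead=("j",), scan=True, next_label="t(6,2)"),
    Statement(stack=("C",), reduce_to="Z", next_label="DNt-Z"),
]


def select(group, stack, next_symbol):
    """Return the first statement whose stack and lookahead tests both succeed."""
    for statement in group:
        n = len(statement.stack)
        stack_ok = tuple(stack[len(stack) - n:]) == statement.stack
        lookahead_ok = not statement.lookahead or next_symbol in statement.lookahead
        if stack_ok and lookahead_ok:
            return statement
    return None                          # an error statement would take over here


print(select(NTA_C, ["a", "C"], "j").next_label)  # t(6,2)
print(select(NTA_C, ["a", "C"], "q").next_label)  # DNt-Z
```

Because the last statement of the group has no lookahead test, the search always succeeds once the stack test passes, which is why no error statement is needed for groups of this type.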
(d)  The FPL parser interpreter instructions


Nh-Σ:                    push ⊥ into recognition stack
                         TPSH
                         GOTO   (L1)
             L0:         ERRR
Nh-S:        L1:         XSIS   (a)
                         BPSH   (1)
                         TPSH
                         GOTO   (L16)
Nh-E:        L2:         XSIS   (f)
                         SETS   (L3)
                         LLVL
                         XLLB   (1)
                         RED1   (E)
                         XBGO   (L0)
             L3:         TPSH
                         GOTO   (L24)
Nh-F:        L4:         XSBB   (2, L6)
             L5:         RED1   (F)
                         XBGO   (L0)
             L6:         ILVL
                         ERRN   (F, L5)
Nta-Σ:       L7:         success exit
Nta-C:       L8:         SETS   (L9)
                         XSLR
                         LLVL
                         XLLS   (j)
                         TPSH
             (t(6,2):)   ILVL
                         REDN   (1, B)
                         XBGO   (L0)
             L9:         RED1   (Z)
                         XBGO   (L0)
Nta-D:       L10:        TPSH
                         GOTO   (L21)
Ntb(1,2):    L11:        BPOP
                         TPSH
                         XSIS   (⊥)
                         REDN   (2, Σ)
                         XBGO   (L7)
Ntb(2,2):    L12:        BPOP
                         REDN   (1, S)
                         XBGO   (L0)
Ntb(9,2):    L13:        SETS   (L14)
                         XSLR
                         LLVL
                         XLLS   (n)
                         TPSH
             (t(10,2):)  ILVL
                         NPOP   (1)
                         XBGO   (L0)
             L14:        BPOP
                         REDN   (1, D)
                         XBGO   (L10)
Ntb(3,2):    L15:        BPOP
                         TPSH
             (t(3,3):)   XSIS   (q)
                         TPSH
             (t(3,4):)   XSIS   (r)
                         REDN   (3, S)
                         XBGO   (L0)
ch(1):       L16:        XSBS   (d, L17)
                         BPSH   (E*(2))
                         TPSH
                         GOTO   (L2)
             L17:        XSBS   (i, L19)
                         SETS   (L18)
                         LLVL
                         XLLB   (2)
                         BPSH   (F(9,2))
                         TPSH
                         GOTO   (L4)
             L18:        RED1   (D)
                         XBGO   (L10)
             L19:        ILVL
                         ERRR
cNtb(2):     L20:        BPOP
                         TPSH
                         GOTO   (L27)
ct(3):       L21:        XSBB   (3, L23)
             L22:        REDN   (1, B)
                         XBGO   (L0)
             L23:        ILVL
                         ERRN   (B, L22)
ct(4):       L24:        XSBB   (4, L26)
             L25:        REDN   (1, E)
                         XBGO   (L0)
             L26:        ILVL
                         ERRN   (E, L25)
ct(5):       L27:        XSBB   (1, L29)
             L28:        REDN   (2, C)
                         XBGO   (L8)
             L29:        ILVL
                         ERRN   (C, L28)

row 1 of pattern array:  x,y
row 2 of pattern array:  k,l,m
row 3 of pattern array:  b,c
row 4 of pattern array:  g,h